Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, the distant supervision may induce noisy labels, thus undermining the robustness of the learned models and restricting the practical application. To relieve this problem, recent works adopt self-training teacher-student frameworks to gradually refine the training labels and improve the generalization ability of NER models. However, we argue that the performance of the current self-training frameworks for DS-NER is severely underestimated by their plain designs, including both inadequate student learning and coarse-grained teacher updating. Therefore, in this paper, we make the first attempt to alleviate these issues by proposing: (1) adaptive teacher learning comprised of joint training of two teacher-student networks and considering both consistent and inconsistent predictions between two teachers, thus promoting comprehensive student learning. (2) fine-grained student ensemble that updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise. To verify the effectiveness of our proposed method, we conduct experiments on four DS-NER datasets. The experimental results demonstrate that our method significantly surpasses previous SOTA methods.
translated by 谷歌翻译
时间句子接地(TSG)是多媒体信息检索中的一项重要但具有挑战性的任务。尽管以前的TSG方法已经达到了不错的性能,但它们倾向于捕获数据集中经常出现的视频问题对的选择偏差,而不是呈现强大的多模式推理能力,尤其是对于很少出现的对。在本文中,我们研究了上述选择偏见的问题,并因此提出了一个偏见-TSG(D-TSG)模型,以过滤和消除视觉和语言方式中的负偏见,以增强模型的概括能力。具体来说,我们建议从两个角度来减轻问题:1)特征蒸馏。我们构建了一个多模式的偏见分支,以首先捕获视觉和语言偏见,然后应用一个偏差识别模块以明确识别真正的负偏见并将其从良性多模式表示中删除。 2)对比样品产生。我们构建两种类型的负样本来强制执行模型,以准确学习对齐的多模式语义并做出完整的语义推理。我们将提出的模型应用于通常和很少出现的TSG案例,并通过在三个基准数据集(ActivityNet标题,Tacos和Charades-STA)上实现最先进的性能来证明其有效性。
translated by 谷歌翻译
本文解决了颞句的接地。以前的作品通常通过学习帧级视频功能来解决此任务并将其与文本信息对齐。这些作品的一个主要限制是,由于帧级特征提取,它们未能利用具有微妙的外观差异的模糊视频帧。最近,一些方法采用更快的R-CNN来提取每帧中的详细物体特征来区分细粒的外观相似性。然而,由于对象检测模型缺乏时间建模,因此通过更快的R-CNN提取的对象级别特征遭受缺失的运动分析。为了解决这个问题,我们提出了一种新颖的运动外观推理网络(MARN),其包括动作感知和外观感知对象特征,以更好的原因对象关系来建立连续帧之间的活动。具体而言,我们首先介绍两个单独的视频编码器以将视频嵌入到相应的主导和外观 - 方面对象表示中。然后,我们开发单独的运动和外观分支,以分别学习运动引导和外观引导的对象关系。最后,来自两个分支的运动和外观信息都与用于最终接地的更多代表性的特征相关联。对两个具有挑战性的数据集(Chardes-Sta和Tacos)的广泛实验表明,我们提出的马恩在以前的最先进的方法中大大优于大幅度。
translated by 谷歌翻译
时间句地接地(TSG)是视频理解的关键和基础。虽然现有方法训练具有大量数据的精心设计的深网络,但我们发现他们可以轻松忘记由于偏移数据分布而在训练阶段的很少出现的情况,这影响了模型概括并导致不希望的表现。为了解决这个问题,我们提出了一个内存增强的网络,称为内存引导的语义学习网络(MGSL-net),它学习并记住在TSG任务中的很少出现的内容。具体而言,MGSL-Net由三个主要部件组成:跨模型互动模块,存储器增强模块和异构注意力模块。我们首先将给定的视频查询对与跨模型图卷积网络对齐,然后利用内存模块在域特定的持久存储器中记录跨模板共享语义功能。在培训期间,内存插槽与常见和罕见的案例动态相关,减轻了遗忘问题。在测试中,可以通过检索存储的存储器来提高罕见的情况,从而产生更好的概括。最后,使用异构注意力模块在视频和查询域中集成增强的多模态特征。三个基准测试的实验结果表明了我们对效率和效率的方法的优势,这在整个数据集上显着提高了准确性,而且在罕见的情况下也是如此。
translated by 谷歌翻译
文档级事件提取中有两个主要挑战:1)参数实体分散在不同的句子中,2)事件触发器通常不可用。为了解决这些挑战,最先前的研究主要关注以自回归方式建立参数链,这在培训和推论方面效率低下。与以前的研究相比,我们提出了一种快速轻量级的模型,名为PTPCG。我们设计非自动评级解码算法,以执行修剪的完整图表的事件参数组合提取,这在自动选择的伪触发器的引导下构造。与以前的系统相比,我们的系统实现了资源消耗较低的竞争结果,只需要3.6%的GPU时间(PFS-Days),推断速度快8.5倍。此外,我们的方法显示了具有(或没有)触发器的数据集的卓越兼容性,并且伪触发器可以是注释触发器的补充剂,以进一步改进。
translated by 谷歌翻译
Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.
translated by 谷歌翻译
With the ever-growing model size and the limited availability of labeled training data, transfer learning has become an increasingly popular approach in many science and engineering domains. For classification problems, this work delves into the mystery of transfer learning through an intriguing phenomenon termed neural collapse (NC), where the last-layer features and classifiers of learned deep networks satisfy: (i) the within-class variability of the features collapses to zero, and (ii) the between-class feature means are maximally and equally separated. Through the lens of NC, our findings for transfer learning are the following: (i) when pre-training models, preventing intra-class variability collapse (to a certain extent) better preserves the intrinsic structures of the input data, so that it leads to better model transferability; (ii) when fine-tuning models on downstream tasks, obtaining features with more NC on downstream data results in better test accuracy on the given task. The above results not only demystify many widely used heuristics in model pre-training (e.g., data augmentation, projection head, self-supervised learning), but also leads to more efficient and principled fine-tuning method on downstream tasks that we demonstrate through extensive experimental results.
translated by 谷歌翻译
With increasing privacy concerns on data, recent studies have made significant progress using federated learning (FL) on privacy-sensitive natural language processing (NLP) tasks. Much literature suggests fully fine-tuning pre-trained language models (PLMs) in the FL paradigm can mitigate the data heterogeneity problem and close the performance gap with centralized training. However, large PLMs bring the curse of prohibitive communication overhead and local model adaptation costs for the FL system. To this end, we introduce various parameter-efficient tuning (PETuning) methods into federated learning. Specifically, we provide a holistic empirical study of representative PLMs tuning methods in FL. The experimental results cover the analysis of data heterogeneity levels, data scales, and different FL scenarios. Overall communication overhead can be significantly reduced by locally tuning and globally aggregating lightweight model parameters while maintaining acceptable performance in various FL settings. To facilitate the research of PETuning in FL, we also develop a federated tuning framework FedPETuning, which allows practitioners to exploit different PETuning methods under the FL training paradigm conveniently. The source code is available at \url{https://github.com/iezhuozhuo/FedETuning/tree/deltaTuning}.
translated by 谷歌翻译
Negotiation is one of the crucial abilities in human communication, and there has been a resurgent research interest in negotiation dialogue systems recently, which goal is to empower intelligent agents with such ability that can efficiently help humans resolve conflicts or reach beneficial agreements. Although there have been many explorations in negotiation dialogue systems, a systematic review of this task has to date remained notably absent. To this end, we aim to fill this gap by reviewing contemporary studies in the emerging field of negotiation dialogue systems, covering benchmarks, evaluations, and methodologies. Furthermore, we also discuss potential future directions, including multi-modal, multi-party, and cross-cultural negotiation scenarios. Our goal is to provide the community with a systematic overview of negotiation dialogue systems and to inspire future research.
translated by 谷歌翻译
Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels among clients are different, called ``label distribution skew''. Directly applying conventional federated learning without consideration of label distribution skew issue significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by the label distribution skew issue. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using the global information of data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art on several public benchmarks. Code is available at \url{https://github.com/Sheng-T/FedMGD}.
translated by 谷歌翻译